High-performance robust speech recognition using stereo training data
Abstract
We describe SPLICE, a novel technique for high-performance robust speech recognition. It is an efficient noise reduction and channel distortion compensation technique that makes effective use of stereo training data. In this paper, we present a version of SPLICE using the minimum-mean-square-error (MMSE) decision, and describe an extension that trains clusters of HMMs with SPLICE processing. Comprehensive results on a Wall Street Journal large-vocabulary recognition task with a wide range of noise types show that SPLICE outperforms recognition under noisy matched conditions (13% word error rate reduction). The new technique is also shown to consistently outperform the spectral subtraction and fixed CDCN noise reduction techniques. It is currently being integrated into the Microsoft MiPad, a new-generation PDA prototype.
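The abstract does not give the SPLICE equations, so the following is only a minimal sketch under stated assumptions. It assumes the commonly described formulation: the noisy features are modeled by a Gaussian mixture, each mixture component k has a correction vector r_k learned from stereo (clean, noisy) frame pairs, and the MMSE estimate of the clean frame is the noisy frame plus the posterior-weighted sum of the r_k. The function names, the diagonal-covariance choice, and the toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def posteriors(noisy, weights, means, variances):
    """Return p(k | y) for each frame under a diagonal-covariance GMM.

    noisy: (N, D) frames; weights: (K,); means, variances: (K, D).
    """
    diff = noisy[:, None, :] - means[None, :, :]               # (N, K, D)
    log_lik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )                                                           # (N, K)
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)             # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def train_corrections(clean, noisy, weights, means, variances):
    """Estimate one correction vector per component from stereo data.

    Assumed estimator: r_k = sum_n p(k|y_n) (x_n - y_n) / sum_n p(k|y_n).
    """
    post = posteriors(noisy, weights, means, variances)         # (N, K)
    num = post.T @ (clean - noisy)                              # (K, D)
    den = post.sum(axis=0)[:, None]                             # (K, 1)
    return num / np.maximum(den, 1e-12)

def enhance(noisy, corrections, weights, means, variances):
    """MMSE-style estimate: x_hat = y + sum_k p(k|y) r_k."""
    post = posteriors(noisy, weights, means, variances)
    return noisy + post @ corrections

# Toy usage with synthetic "stereo" data (not real speech features).
rng = np.random.default_rng(0)
gmm_weights = np.array([0.5, 0.5])
gmm_means = np.array([[0.0, 0.0], [10.0, 10.0]])
gmm_vars = np.ones((2, 2))
labels = rng.integers(0, 2, size=200)
noisy_frames = gmm_means[labels] + rng.normal(size=(200, 2))
true_offsets = np.array([[1.0, -1.0], [-2.0, 3.0]])
clean_frames = noisy_frames + true_offsets[labels]

r = train_corrections(clean_frames, noisy_frames, gmm_weights, gmm_means, gmm_vars)
restored = enhance(noisy_frames, r, gmm_weights, gmm_means, gmm_vars)
```

In this toy setup the two mixture components are well separated, so the learned corrections recover the per-component offsets almost exactly. Real stereo data would need a GMM trained on the noisy features first, which is omitted here.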
Similar resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Improving the performance of MFCC for Persian robust speech recognition
The Mel frequency cepstral coefficients (MFCC) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...
Feature Normalisation for Robust Speech Recognition
Speech recognition system performance degrades in noisy environments. If the acoustic models (HMMs) for speech are built using features of clean utterances, the features of a noisy test utterance would be acoustically mismatched with the trained model. This gives poor likelihood values and poor recognition accuracy. Model adaptation and feature normalisation are two broad areas that address thi...
Adaptive stereo-based stochastic mapping
Stereo-based stochastic mapping (SSM) is a technique based on constructing a Gaussian mixture model for the joint distribution of stereo data. This paper considers the use of SSM for noise robust speech recognition, in which clean and noisy speech features form the stereo data. The Gaussian mixture model, whose parameters are estimated from the observed stereo features during training time, is ...
Publication date: 2001